Text punctuation restoration for Vietnamese speech recognition with multimodal features
Hua LAI, Tong SUN, Wenjun WANG, Zhengtao YU, Shengxiang GAO, Ling DONG
Journal of Computer Applications    2024, 44 (2): 418-423.   DOI: 10.11772/j.issn.1001-9081.2023020231

The text sequence output by a Vietnamese speech recognition system lacks punctuation, and punctuating the recognized text helps eliminate ambiguity and makes the text easier to understand. However, punctuation restoration models based on the text modality alone predict punctuation inaccurately on noisy text, because phoneme errors frequently occur in Vietnamese speech recognition systems and can corrupt the semantics of the text. A Vietnamese speech recognition text punctuation restoration method using multimodal features was proposed, in which intonation pauses and tone changes in the Vietnamese speech guide the model to correctly predict punctuation for noisy text. Specifically, Mel-Frequency Cepstral Coefficients (MFCC) were used to extract speech features, a pretrained language model was used to extract text context features, and a label attention mechanism was used to fuse the speech and text features, thereby enhancing the model's ability to learn contextual information from noisy Vietnamese text. Experimental results show that, compared to punctuation restoration models based on Transformer and BERT (Bidirectional Encoder Representations from Transformers) that extract only text features, the proposed method improves precision, recall, and F1 score on the Vietnamese dataset by at least 10 percentage points, demonstrating the effectiveness of fusing speech and text features for improving punctuation prediction accuracy on noisy Vietnamese speech recognition text.
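
The sketch below illustrates the kind of label-attention fusion the abstract describes: per-token text features from a pretrained language model are combined with frame-level MFCC speech features, with learnable punctuation-label embeddings acting as attention queries over the acoustic sequence. All dimensions, the class inventory, and the module structure are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical sketch (PyTorch) of fusing text and speech features with a
# label attention mechanism for punctuation restoration. Shapes and the
# label set are assumptions for illustration only.
import torch
import torch.nn as nn

class LabelAttentionFusion(nn.Module):
    """Fuse per-token text features with MFCC speech features using
    attention queries derived from learnable punctuation-label embeddings."""

    def __init__(self, text_dim=768, speech_dim=39, hidden_dim=256, num_labels=4):
        super().__init__()
        # Project both modalities into a shared hidden space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.speech_proj = nn.Linear(speech_dim, hidden_dim)
        # One learnable embedding per punctuation label (e.g. none, comma,
        # period, question mark) serves as an attention query.
        self.label_embed = nn.Parameter(torch.randn(num_labels, hidden_dim))
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, text_feats, speech_feats):
        # text_feats:   (batch, tokens, text_dim)   from a pretrained LM
        # speech_feats: (batch, frames, speech_dim)  MFCCs, e.g. 13 + deltas
        t = self.text_proj(text_feats)           # (B, T, H)
        s = self.speech_proj(speech_feats)       # (B, F, H)
        # Label-aware attention over speech frames: label embeddings query the
        # acoustic sequence for pause and tone cues relevant to each class.
        queries = self.label_embed.unsqueeze(0).expand(t.size(0), -1, -1)
        label_ctx, _ = self.attn(queries, s, s)  # (B, num_labels, H)
        # Broadcast the pooled acoustic context to every token and classify.
        acoustic = label_ctx.mean(dim=1, keepdim=True).expand(-1, t.size(1), -1)
        fused = torch.cat([t, acoustic], dim=-1)  # (B, T, 2H)
        return self.classifier(fused)             # per-token punctuation logits

if __name__ == "__main__":
    model = LabelAttentionFusion()
    text = torch.randn(2, 32, 768)    # e.g. BERT hidden states for 32 tokens
    speech = torch.randn(2, 200, 39)  # e.g. 200 frames of 39-dim MFCCs
    print(model(text, speech).shape)  # torch.Size([2, 32, 4])
```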
